ICECUBE是一种用于检测1 GEV和1 PEV之间大气和天体中微子的光学传感器的立方公斤阵列,该阵列已部署1.45 km至2.45 km的南极的冰盖表面以下1.45 km至2.45 km。来自ICE探测器的事件的分类和重建在ICeCube数据分析中起着核心作用。重建和分类事件是一个挑战,这是由于探测器的几何形状,不均匀的散射和冰中光的吸收,并且低于100 GEV的光,每个事件产生的信号光子数量相对较少。为了应对这一挑战,可以将ICECUBE事件表示为点云图形,并将图形神经网络(GNN)作为分类和重建方法。 GNN能够将中微子事件与宇宙射线背景区分开,对不同的中微子事件类型进行分类,并重建沉积的能量,方向和相互作用顶点。基于仿真,我们提供了1-100 GEV能量范围的比较与当前ICECUBE分析中使用的当前最新最大似然技术,包括已知系统不确定性的影响。对于中微子事件分类,与当前的IceCube方法相比,GNN以固定的假阳性速率(FPR)提高了信号效率的18%。另外,GNN在固定信号效率下将FPR的降低超过8(低于半百分比)。对于能源,方向和相互作用顶点的重建,与当前最大似然技术相比,分辨率平均提高了13%-20%。当在GPU上运行时,GNN能够以几乎是2.7 kHz的中位数ICECUBE触发速率的速率处理ICECUBE事件,这打开了在在线搜索瞬态事件中使用低能量中微子的可能性。
translated by 谷歌翻译
te reo m \ = aori(称为m \ = aori),新西兰的土著语言在语言技术中的资源不足。 m \ = aori扬声器是双语的,其中m \ = aori用英语进行了代码开关。不幸的是,M \ = AORI语言技术,语言检测和M \ = Aori-English对之间的代码转换检测的资源最少。英语和M \ = AORI都使用罗马衍生的拼字法制作基于规则的系统来检测语言和代码转换限制性。大多数M \ = AORI语言检测是由语言专家手动完成的。这项研究构建了66,016,807个单词的Aori英语双语数据库,并带有单词级语言注释。新西兰议会汉萨德辩论报告用于构建数据库。语言标签是使用特定语言规则和专家手册注释分配的。 M \ = AORI和英语的单词具有相同的拼写,但含义不同。这些词不能根据单词级的语言规则将其归类为M \ = AORI或英语。因此,需要手动注释。还报道了报告数据库的各个方面的分析,例如元数据,逐年分析,经常出现的单词,句子长度和n-grams。这里开发的数据库是新西兰Aotearoa的未来语言和语音技术开发的宝贵工具。遵循标签数据库的方法也可以遵循其他低资源的语言对。
translated by 谷歌翻译
公平是世界各地的文明可以观察到的主要社会价值。这表明这是社会协议,通常用文本描述,例如合同。然而,尽管存在普遍性,但描述了描述社会法案的文本的公平度量仍然存在。为了解决这个问题,我们会返回基于第一个校长的问题来考虑问题。我们利用社会心理学文献而不是使用规则或模板来确定人类在进行公平评估时使用的主要因素。然后,我们尝试将这些单词嵌入式数字化为一个多维句子级公平感知向量,以用作这些公平感知的近似。该方法利用Word Embeddings内的Pro-社会偏见,我们获得F1 = 81.0。基于所述公平逼近向量的PCA和ML使用PCA和M1产生的第二种方法产生86.2的F1得分。我们详细介绍了方法中可以制作的改进,以将句子嵌入到公平性的子空间表示的投影。
translated by 谷歌翻译
灵巧的操纵仍然是机器人技术中的一个空缺问题。为了协调研究界为解决这个问题的努力,我们提出了共同的基准。我们设计和构建了机器人平台,该平台托管在MPI上供智能系统托管,可以远程访问。每个平台由三个能够敏捷物体操纵的机器人手指组成。用户能够通过提交自动执行的代码(类似于计算群集)来远程控制平台。使用此设置,i)我们举办机器人竞赛,来自世界任何地方的团队访问我们的平台以应对具有挑战性的任务ii)我们发布了在这些比赛中收集的数据集(包括数百个机器人小时),而我们为研究人员提供了访问自己项目的这些平台。
translated by 谷歌翻译
目前,由精确的径向速度(RV)观察结果受到恒星活性引入的虚假RV信号的限制。我们表明,诸如线性回归和神经网络之类的机器学习技术可以有效地从RV观测中删除活动信号(由于星形/张图引起的)。先前的工作着重于使用高斯工艺回归等建模技术仔细地过滤活性信号(例如Haywood等人,2014年)。取而代之的是,我们仅使用对光谱线平均形状的更改进行系统地删除活动信号,也没有有关收集观测值的信息。我们对模拟数据(使用SOAP 2.0软件生成; Dumusque等人,2014年生成)和从Harps-N太阳能望远镜(Dumusque等,2015; Phillips等人2015; 2016; Collier训练)培训了机器学习模型。 Cameron等人2019)。我们发现,这些技术可以从模拟数据(将RV散射从82 cm/s提高到3 cm/s)以及从HARPS-N太阳能望远镜中几乎每天进行的600多种真实观察结果来预测和消除恒星活动(将RV散射从82 cm/s提高到3 cm/s)。 (将RV散射从1.753 m/s提高到1.039 m/s,提高了约1.7倍)。将来,这些或类似的技术可能会从太阳系以外的恒星观察中去除活动信号,并最终有助于检测到阳光状恒星周围可居住的区域质量系外行星。
translated by 谷歌翻译
We demonstrate a proof-of-concept of a large language model conducting corporate lobbying related activities. We use an autoregressive large language model (OpenAI's text-davinci-003) to determine if proposed U.S. Congressional bills are relevant to specific public companies and provide explanations and confidence levels. For the bills the model deems as relevant, the model drafts a letter to the sponsor of the bill in an attempt to persuade the congressperson to make changes to the proposed legislation. We use hundreds of ground-truth labels of the relevance of a bill to a company to benchmark the performance of the model, which outperforms the baseline of predicting the most common outcome of irrelevance. However, we test the ability to determine the relevance of a bill with the previous OpenAI GPT-3 model (text-davinci-002), which was state-of-the-art on many language tasks until text-davinci-003 was released on November 28, 2022. The performance of text-davinci-002 is worse than simply always predicting that a bill is irrelevant to a company. These results suggest that, as large language models continue to improve core natural language understanding capabilities, performance on corporate lobbying related tasks will continue to improve. We then discuss why this could be problematic for societal-AI alignment.
translated by 谷歌翻译
In the past years, deep learning has seen an increase of usage in the domain of histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their own uncertainty and be able to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole-Slide-Images under domain shift using the H\&E stained Camelyon17 breast cancer dataset. Although it is known that histopathological data can be subject to strong domain shift and label noise, to our knowledge this is the first work that compares the most common methods for uncertainty estimation under these aspects. In our experiments, we compare Stochastic Variational Inference, Monte-Carlo Dropout, Deep Ensembles, Test-Time Data Augmentation as well as combinations thereof. We observe that ensembles of methods generally lead to higher accuracies and better calibration and that Test-Time Data Augmentation can be a promising alternative when choosing an appropriate set of augmentations. Across methods, a rejection of the most uncertain tiles leads to a significant increase in classification accuracy on both in-distribution as well as out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. We observe that the border regions of the Camelyon17 dataset are subject to label noise and evaluate the robustness of the included methods against different noise levels. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
translated by 谷歌翻译
In large-scale machine learning, recent works have studied the effects of compressing gradients in stochastic optimization in order to alleviate the communication bottleneck. These works have collectively revealed that stochastic gradient descent (SGD) is robust to structured perturbations such as quantization, sparsification, and delays. Perhaps surprisingly, despite the surge of interest in large-scale, multi-agent reinforcement learning, almost nothing is known about the analogous question: Are common reinforcement learning (RL) algorithms also robust to similar perturbations? In this paper, we investigate this question by studying a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction, where a general compression operator is used to model the perturbation. Our main technical contribution is to show that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic theoretical guarantees as their SGD counterparts. We then extend our results significantly to nonlinear stochastic approximation algorithms and multi-agent settings. In particular, we prove that for multi-agent TD learning, one can achieve linear convergence speedups in the number of agents while communicating just $\tilde{O}(1)$ bits per agent at each time step. Our work is the first to provide finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling. Our analysis hinges on studying the drift of a novel Lyapunov function that captures the dynamics of a memory variable introduced by error feedback.
translated by 谷歌翻译
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually-degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack, with higher level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervision to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
translated by 谷歌翻译
Research on automated essay scoring has become increasing important because it serves as a method for evaluating students' written-responses at scale. Scalable methods for scoring written responses are needed as students migrate to online learning environments resulting in the need to evaluate large numbers of written-response assessments. The purpose of this study is to describe and evaluate three active learning methods than can be used to minimize the number of essays that must be scored by human raters while still providing the data needed to train a modern automated essay scoring system. The three active learning methods are the uncertainty-based, the topological-based, and the hybrid method. These three methods were used to select essays included as part of the Automated Student Assessment Prize competition that were then classified using a scoring model that was training with the bidirectional encoder representations from transformer language model. All three active learning methods produced strong results, with the topological-based method producing the most efficient classification. Growth rate accuracy was also evaluated. The active learning methods produced different levels of efficiency under different sample size allocations but, overall, all three methods were highly efficient and produced classifications that were similar to one another.
translated by 谷歌翻译